-
Notifications
You must be signed in to change notification settings - Fork 27.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make VideoMAEImageProcessor much faster #28221
Conversation
Creating a tensor from a list of numpy.ndarrays is extremely slow. This small change makes the preprocessing step much faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ikergarcia1996 - thanks for opening this PR and for the detail write-up! And apologies for the delay in review.
Could you extend this logic to the other image processors in the library?
Co-authored-by: amyeroberts <[email protected]>
@amyeroberts The suggestion makes sense, although we also need to define Regarding extending this logic to other image processors. I think that the root of the issue is here: transformers/src/transformers/feature_extraction_utils.py Lines 138 to 141 in 4a66c0d
The def recursive_ndarray_check(value):
if isinstance(value, (list, tuple)) and len(value) > 0:
return recursive_ndarray_check(value[0])
return isinstance(value, np.ndarray)
if recursive_ndarray_check(value):
value = np.array(value)
return torch.tensor(value) |
@ikergarcia1996 Thanks for the detailed explanation! In principle, I'd be pro adding in the recursive check. However, there's other models e.g. audio models which rely on this logic which might also be affected by this change so we'd have to make sure it's well tested for these models too. Let's add it now, we can iron out any issues it might flag for the vision models and then I can ask the audio team if they think they're be any issues. |
Hi @ikergarcia1996, are you still working on this? It would be great to have this contribution! The next steps would be adding in the recursive logic you proposed. |
Sorry, @amyeroberts, I had other urgent matters to attend to and forgot about this. I have updated the code; it works for VideoMAEImageProcessor, although I am not sure if it may cause issues with other models. The tests are failing, but the error seems unrelated to the changes in the code.
|
@ikergarcia1996 Thanks for updating! The natten issues aren't related to this PR - we recently had issues on our CI runs because of recent package releases and incompatible versions. A fix has been pushed to the |
Rebase fork
Merge pull request #1 from huggingface/main
@amyeroberts I have updated my branch, but the tests still fail. |
@ikergarcia1996 Apolgies. There's been some continued issues with handling compatibility between packages. A final fix should have been merged into main now. Could you try rebasing again? |
@amyeroberts done 😃 |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this!
Just a few small comments on what's there. Only thing that needs to be added are some tests to make sure that the inputs and outputs are handled as expected. In particular, it would be good to test for a vision model which outputs more than just pixel_values
, for example annotations for DETR
.
# Speeds up tensor conversion - see: https://github.com/huggingface/transformers/pull/28221/files | ||
data = {"pixel_values": np.asarray(videos) if return_tensors == "pt" else videos} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, although it doesn't hurt either. Should we remove it?
@ikergarcia1996 Any update on this? It would be great to have this included in the library! |
Co-authored-by: amyeroberts <[email protected]>
Hi @amyeroberts! |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
What does this PR do?
Currently,
VideoMAEImageProcessor
is extremely slow. In fact, during inference, it takes longer to preprocess a video than to run the model. After investigating the code, I discovered that this issue can be easily fixed.Currently, the
preprocess()
function inVideoMAEImageProcessor
creates alist of list of ndarrays
(asself._preprocess_image()
returns an ndarray), which is then sent toBatchFeature
to be converted into atorch.tensor
. The issue arises because creating a tensor from a list of ndarrays is extremely slow. Additional information on this problem can be found here: pytorch/pytorch#13918.By converting the list of ndarrays into a single ndarray, a significant speedup can be achieved. Here is a minimal example for demonstration.
We create two processors: one using the current code and another with the modified code.
We then create a video of 128 frammes with a 200x200 resolution
I have run both processors in a jupyter notebook
Just to be sure than nothing changes
With this small change, we reduce the video preprocessing time from 1.11 seconds to 154 ms 😃
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@amyeroberts